# **MP4 Checkpoint 3 Deliverables**

**Progress Report, Roadmap** 

December 4th, 2020

Alex Vetsavong
Peter Kircher
Mohan Li

**Responsibilities:** What did people do for this checkpoint?

## M-extension:

Mohan fully implemented a 32\*32 add-shift multiplier and a divider, and verified the signed and unsigned instructions on both through testbenches. He also redesigned the Wallace multiplier based on the TA's suggestions, but the redesign has not been verified yet. In addition, he made progress on adding the datapath connections needed to integrate the multiplier and divider.
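As a point of reference for the add-shift approach described above, the following is a minimal behavioral sketch in Python, not the team's RTL. The function names (`shift_add_mul`, `signed_mul`, `restoring_div`) are hypothetical, and handling signed operands by multiplying magnitudes and then fixing the sign is an assumption about one common approach, not a description of the actual design.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def shift_add_mul(a: int, b: int) -> int:
    """Unsigned 32x32 -> 64-bit multiply, one multiplier bit per iteration."""
    a &= MASK32
    b &= MASK32
    product = 0
    for i in range(32):
        if (b >> i) & 1:          # add the shifted multiplicand when bit i is set
            product += a << i
    return product


def signed_mul(a: int, b: int) -> int:
    """Signed multiply (assumed scheme): multiply magnitudes, then fix the sign."""
    sa, sb = to_signed32(a), to_signed32(b)
    result = shift_add_mul(abs(sa), abs(sb))
    if (sa < 0) != (sb < 0):
        result = -result
    return result & MASK64        # 64-bit two's complement encoding


def restoring_div(n: int, d: int):
    """Unsigned 32-bit restoring division; returns (quotient, remainder)."""
    n &= MASK32
    d &= MASK32
    if d == 0:
        raise ZeroDivisionError("divide-by-zero is handled separately")
    quotient, remainder = 0, 0
    for i in range(31, -1, -1):
        remainder = (remainder << 1) | ((n >> i) & 1)  # shift in next dividend bit
        if remainder >= d:        # subtraction succeeds: keep it and set quotient bit
            remainder -= d
            quotient |= 1 << i
    return quotient, remainder
```

Each loop iteration corresponds roughly to one cycle of a sequential add-shift unit, which is why the pipeline has to wait for these operations to finish.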

## L2 Cache and 4-way Cache:

Peter implemented the L2 cache, which took the place of the arbiter. He also created a 4-way set-associative variant of the I-Cache and D-Cache. Because the 4-way variant is not yet functional, it is integrated with an option to exclude it from the design.

## Hardware prefetcher:

Alex implemented a hardware prefetcher for the I-Cache. He determined that instruction fetch benefits the most from having the next line available, and that the benefit for the D-Cache did not outweigh the cost in memory bandwidth. The prefetcher does not currently provide any real performance benefit, however, so the design needs to be reevaluated to determine why.
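The idea behind next-line prefetching can be illustrated with a small, idealized Python model. This is only a sketch under stated assumptions: the 32-byte line size, the unbounded cache, and the function name `count_demand_misses` are all hypothetical, and the model assumes every prefetch completes instantly.

```python
LINE_BYTES = 32  # assumed cache line size, not taken from the design


def count_demand_misses(pcs, prefetch: bool) -> int:
    """Count demand misses for a sequence of fetch addresses.

    Idealized: the cache never evicts, and a prefetched line is
    available immediately (no latency, no bandwidth contention).
    """
    cached = set()
    misses = 0
    for pc in pcs:
        line = pc // LINE_BYTES
        if line not in cached:
            misses += 1
            cached.add(line)
        if prefetch:
            cached.add(line + 1)  # next-line prefetch on every access
    return misses


# 64 sequential 4-byte instructions span 8 lines.
straight_line = range(0, 256, 4)
without = count_demand_misses(straight_line, prefetch=False)
with_pf = count_demand_misses(straight_line, prefetch=True)
```

In this idealized setting straight-line code misses only on its first line. A real prefetcher competes with demand misses for memory bandwidth and may not finish before the line is needed, so the measured gain can be much smaller than this model suggests.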

## Debug and integration:

Each of the options is fully or nearly integrated into the design, with associated performance counters. Much debugging effort went into the correctness of the design, and the forwarding method was changed to handle the unique case of forwarding rs2 to a store instruction.
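The store case is unusual because rs2 is not an ALU operand (the ALU adds rs1 and the immediate) but still has to carry the newest value as the store data. The sketch below illustrates that idea in Python. It assumes a classic five-stage pipeline with EX/MEM and MEM/WB pipeline registers, and all names (`forward_select`, `store_operand_sources`) are hypothetical rather than taken from the actual design. Load-use stalls are not modeled.

```python
def forward_select(src_reg, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Pick where the newest value of src_reg comes from."""
    if src_reg == 0:
        return "regfile"              # x0 is hard-wired to zero, never forwarded
    if ex_mem_regwrite and ex_mem_rd == src_reg:
        return "ex_mem"               # youngest producer wins
    if mem_wb_regwrite and mem_wb_rd == src_reg:
        return "mem_wb"
    return "regfile"


def store_operand_sources(rs1, rs2, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return (address base source, store data source) for a store in EX.

    The store data path gets its own forwarding decision for rs2,
    separate from the ALU's second operand, which is the immediate.
    """
    base = forward_select(rs1, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite)
    data = forward_select(rs2, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite)
    return base, data
```

For example, a store whose rs2 was written by the immediately preceding ALU instruction would take its data from the EX/MEM register rather than from the register file.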

**Roadmap:** Who's doing what?

## Alex:

Will continue to refine the prefetching scheme, work towards full integration of the optimizations, and work on the final documentation. Specifically, will gather performance metrics and counters for the presentation and final report.

## Mohan:

Will work on (1) finishing the implementation of the Wallace multiplier and (2) integrating the multiplier and divider with the processor, including stalling while a mul/div operation finishes, muxing the multiplier inputs for data forwarding, handling all 8 instructions, and so on. The short-term plan is to fully integrate the add-shift multiplier and divider with the processor. After that, will continue developing the Wallace multiplier and substitute it for the add-shift multiplier.
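For checking the 8 instructions during integration, a small golden model can be handy. The Python sketch below follows the RV32M semantics in the RISC-V specification (MUL, MULH, MULHSU, MULHU, DIV, DIVU, REM, REMU), including the divide-by-zero and signed-overflow results. The function names are hypothetical, and this is a reference model for comparison against testbench output, not part of the team's design.

```python
MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def trunc_div(a: int, b: int) -> int:
    """Signed division rounding toward zero (Python's // rounds down)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def rv32m(op: str, rs1: int, rs2: int) -> int:
    """Return the 32-bit result of an RV32M instruction."""
    u1, u2 = rs1 & MASK32, rs2 & MASK32
    a, b = s32(rs1), s32(rs2)
    if op == "mul":
        return (u1 * u2) & MASK32
    if op == "mulh":                  # signed x signed, upper 32 bits
        return ((a * b) >> 32) & MASK32
    if op == "mulhsu":                # signed x unsigned, upper 32 bits
        return ((a * u2) >> 32) & MASK32
    if op == "mulhu":                 # unsigned x unsigned, upper 32 bits
        return ((u1 * u2) >> 32) & MASK32
    if op == "div":                   # divide by zero gives all ones
        return MASK32 if b == 0 else trunc_div(a, b) & MASK32
    if op == "divu":
        return MASK32 if u2 == 0 else u1 // u2
    if op == "rem":                   # divide by zero gives the dividend
        return u1 if b == 0 else (a - trunc_div(a, b) * b) & MASK32
    if op == "remu":
        return u1 if u2 == 0 else u1 % u2
    raise ValueError(f"unknown op: {op}")
```

The signed overflow case (the most negative value divided by -1) falls out of the masking: the quotient wraps back to `0x80000000` and the remainder is 0, as the specification requires.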

## Peter:

Will work on fixing the 4-way cache, fully integrating the optimizations, and writing the final documentation.